Zoom in ggplot maps with Coordinate Reference
Systems (CRS) using the coord_sf()
function.
Change the CRS projection of a
ggplot map using the crs argument of
coord_sf().
Configure data with UTM coordinate system using
the st_as_sf() function.
Change the CRS projection of a sf
object using the st_transform() function.
This lesson requires the following packages:
if(!require('pacman')) install.packages('pacman')
pacman::p_load(malariaAtlas,
colorspace,
ggplot2,
cholera,
spData,
here,
rio,
sf)
pacman::p_load_gh("afrimapr/afrilearndata",
"wmgeolab/rgeoboundaries")From the second lesson, we learned that Vector data require a Coordinate Reference System (CRS) to relate the spatial elements of the data with the surface of Earth. For that reason, Coordinate systems are a key component of geographic objects.
Figure 1. Direction of longitude and latitude.
In this lesson we are going to learn how to manage the CRS of maps by zooming in to an area of interest, set them up to external data with coordinates different to longitude and latitude (called UTM), and transform between different coordinate systems of CRS’s.
First, let us start with creating a base map of the
world from the spData package using
ggplot2:
ggplot(data = world) +
geom_sf()coord_sf()The function coord_sf() from the {ggplot2}
package allows to deal with the coordinate system,
which includes both the extent and projection of a
map.
The extent of the map can also be set in
coord_sf(), in practice allowing to “zoom” in the area of
interest, provided by limits on the x-axis (xlim), and on
the y-axis (ylim).
Here, we zoom in the world map to the African continent,
which is in an area delimited in longitude between 20°W
and 55°E, and in latitude between 35°S and 40°N. To
exactly match the limits provided, use expand = FALSE.
ggplot(data = world) +
geom_sf() +
coord_sf(xlim = c(-20, 55), ylim = c(-35, 40), expand = FALSE)Check which + and - signs
are related with the cardinal direction:
-, East is +,-, North is +.Zoom in the africountries map to Sierra Leona, which is
in an area delimited in longitude between 14°W and 10°W, and in latitude
between 6°N and 10°N.
The world object have a CRS projection
called WGS84 (detailed in the fifth line of
the header)
world## Geometry set for 177 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -89.9 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 5 geometries:
Which corresponds to the EPSG code 4326:
st_crs(world)$input## [1] "EPSG:4326"
Projection refers to the mathematical equation that was used to project the truly round earth onto a flat surface.
Figure 2. Projection of a CRS.
EPSG refers to the European Petroleum Survey Group (EPSG).
EPSG is a Spatial Reference System Identifier (SRID) with arbitrary codes available for concrete CRS projections.
One of these projections is WGS84, which refers to World Geodetic System 1984.
Using the crs argument of the coord_sf()
function, it is possible to override this setting, and
project on the fly to any projection.
For example, we can change the current WGS84 projection to the ETRS89 Lambert Azimuthal Equal-Area projection (alias LAEA), which is EPSG code 3035:
ggplot(data = world) +
geom_sf() +
coord_sf(crs = 3035)However, this CRS projection is useful for European countries.
Get into the linked webpage to find the EPSG code that its required:
Change the CRS projection of the ggplot map with the
world object to the Pseudo-Mercator coordinate system.
Which projection should I use?
To decide if a projection is right for your data, answer these questions:
Take the time to identify a projection that is suited for your project. You don’t have to stick to the ones that are popular.
Online tools like Projection Wizard can also help you discover projections that might be a better fit for your data.
Figure 3. Steps to find a custom projection with Project Wizard.
For instance, to find an appropriate projection for the African continent, you can:
Then, paste that valid PROJ string to the
crs argument:
ggplot(data = world) +
geom_sf() +
coord_sf(crs = "+proj=laea +lon_0=19.6875 +lat_0=0 +datum=WGS84 +units=m +no_defs")PROJ is an open-source library for storing, representing and transforming CRS information.
Get into the linked webpage to find the PROJ string that its required:
Change the CRS projection of the ggplot map with the
world object to the Aitoff
coordinate system.
A CRS has a few key components:
Coordinate System - There are many many different coordinate systems, so make sure you know which system your coordinates are from. (e.g. longitude/latitude, which is the most common);
Units - Know what the units are for your coordinate system (e.g. decimal degrees, meters);
Datum - A particular modeled version of the Earth. These have been revised over the years, so ensure that your map layers are using the same datum. (e.g. WGS84);
Projections - As defined above, it refers to the mathematical equation that was used to project the truly round earth onto a flat surface.
The “orange peel” analogy is useful to understand projections. If you imagine that the earth is an orange, how you peel it and then flatten the peel is similar to how projections get made.
Figure 4. Image of citrus.
Figure 5. Image of peeled orange with globe.
Figure 6. Maps of the United States in different projections (Source: opennews.org)
The above image shows maps of the United States in different projections. Notice the differences in shape associated with each projection. These differences are a direct result of the calculations used to flatten the data onto a 2-dimensional map.
Data from the same location but saved in different projections will not line up in any GIS software.
Thus, it’s important when working with spatial data to identify the coordinate reference system applied to the data and retain it throughout data processing and analysis.
Let’s say that you receive a data frame with coordinates, but
without a CRS projection. If you want to make a
ggplot map with it, you know that you can use
st_as_sf() from the {sf} package.
As we learned in a previous lesson, for point data we need to specify the coordinates and the CRS:
fatalities %>%
st_as_sf(coords = c("x","y"),
crs = 4326) %>%
ggplot() +
geom_sf(alpha = 0.3)However, what if you receive coordinates different to longitude and latitude?
To exemplify this scenario, we are going to use malaria prevalence in The Gambia. We use data of malaria prevalence in children obtained at 65 villages in The Gambia.
First, we download the gambia.rda file from internet
using its URL path with the {rio} package.
gambia <- rio::import("https://github.com/cran/geoR/raw/master/data/gambia.rda")
as_tibble(gambia)## # A tibble: 2,035 × 8
## x y pos age netuse treated green phc
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 349631. 1458055 1 1783 0 0 40.8 1
## 2 349631. 1458055 0 404 1 0 40.8 1
## 3 349631. 1458055 0 452 1 0 40.8 1
## 4 349631. 1458055 1 566 1 0 40.8 1
## 5 349631. 1458055 0 598 1 0 40.8 1
## 6 349631. 1458055 1 590 1 0 40.8 1
## 7 349631. 1458055 1 589 1 0 40.8 1
## 8 349631. 1458055 0 609 1 0 40.8 1
## 9 349631. 1458055 0 791 1 0 40.8 1
## 10 349631. 1458055 1 829 0 0 40.8 1
## # … with 2,025 more rows
It is a data frame with 2035 observations and 8 variables, among them:
x: x coordinate of the village
(UTM),y: y coordinate of the village
(UTM),pos: presence (1) or absence (0) of malaria in a blood
sample taken from the child,UTM stants for Universal Transverse Mercator, another coordinate system.
Data in gambia are given at an individual
level. To get an estimate of the prevalence per village, we
need to aggregate the malaria tests by
village.
Using the dplyr::distinct() function, we can see that
gambia has 2035 rows and 65 unique pair of
coordinates. This indicates that 2035 malaria tests were conducted at 65
locations.
gambia %>%
distinct(x,y) %>%
nrow()## [1] 65
group_by() and
summarise()Here we use dplyr to create a data frame called
gambia_point_summary containing, for each
village the:
gambia_point_summary <-
gambia %>%
group_by(x, y) %>%
summarize(
total = n(),
positive = sum(pos),
prev = positive / total
) %>%
ungroup()gambia_point_summary## # A tibble: 65 × 5
## x y total positive prev
## <dbl> <dbl> <int> <dbl> <dbl>
## 1 349631. 1458055 33 17 0.515
## 2 358543. 1460112 63 19 0.302
## 3 360308. 1460026 17 7 0.412
## 4 363796. 1496919 24 8 0.333
## 5 366400. 1460248 26 10 0.385
## 6 366688. 1463002 18 7 0.389
## 7 368420. 1460569 38 24 0.632
## 8 370400. 1460301 56 7 0.125
## 9 370628. 1499589 57 6 0.105
## 10 374403. 1501392 26 8 0.308
## # … with 55 more rows
Now we can plot the malaria prevalence. However, in order to use
ggplot2 and geom_sf(), we need to transform
the data.frame to an sf object.
To use the st_as_sf() function from the
{sf} package, we need to specify a CRS projection. But in
this case, the units of the x and
y variables are not in Geographic
coordinates (longitude/latitude). Instead, these data coordinates are in
UTM format (Easting/Northing), also called
Projected coordinates.
CRS coordinate systems:
Geographic (or unprojected) reference systems use longitude and latitude for referencing a location on the Earth’s three-dimensional ellipsoid surface.
Projected coordinate reference systems use easting and northing Cartesian coordinates for referencing a location on a two-dimensional representation of the Earth.
Figure 7. Coordinate systems. Left: Geographic, Right: Projected.
All Projected CRSs are based on a Geographic CRS and rely on map projections to convert the three-dimensional surface of the Earth into Easting and Northing (x and y) values in a projected CRS.
Figure 8. Coordinate systems. Left: Geographic, Right: Projected.
Which of the following options of Coordinate Reference System (CRS) types:
"geographic_crs""projected_crs"…corresponds to each of these datasets, given the magnitude of the
values in their x and y columns:
the parana dataset?
parana <- import("https://github.com/cran/geoR/raw/master/data/parana.rda")
as_tibble(parana$coords)the fatalities dataset?
pacman::p_load(cholera)
as_tibble(fatalities)st_as_sf()First, we need to set UTM coordinates. For this, we
specify the projection of The Gambia, that is, UTM zone
28 ("+proj=utm +zone=28") in the
st_as_sf() function of the {sf} package.
gambia_projected <- gambia_point_summary %>%
# first, specify the projection of gambia
# UTM zone 28
st_as_sf(coords = c("x", "y"),
crs = "+proj=utm +zone=28")
gambia_projected## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 349631.3 ymin: 1458055 xmax: 622086.1 ymax: 1510610
## CRS: +proj=utm +zone=28
## # A tibble: 65 × 4
## total positive prev geometry
## * <int> <dbl> <dbl> <POINT [m]>
## 1 33 17 0.515 (349631.3 1458055)
## 2 63 19 0.302 (358543.1 1460112)
## 3 17 7 0.412 (360308.1 1460026)
## 4 24 8 0.333 (363795.7 1496919)
## 5 26 10 0.385 (366400.5 1460248)
## 6 18 7 0.389 (366687.5 1463002)
## 7 38 24 0.632 (368420.5 1460569)
## 8 56 7 0.125 (370399.5 1460301)
## 9 57 6 0.105 (370627.7 1499589)
## 10 26 8 0.308 (374402.6 1501392)
## # … with 55 more rows
Confirm the presence of the:
CRS: +proj=utm +zone=28) inside the
header of the new sf object, andgeometry column in
meters (<POINT [m]>).The UTM system divides the Earth into 60 zones of 6 degrees of longitude in width. Each of the zones uses a transverse Mercator projection that maps a region of large north-south extent.
To get the UTM zones of various parts of the world, you could use online interactive maps, or gridded images available in wikipedia.
In the UTM system, a position on the Earth is given by the:
parana_data contains the average rainfall over different
years for the period May-June (dry-season). It was collected at 143
recording stations throughout Parana State, Brasil.
Set UTM coordinate system to the parana_data.
parana_data <- as_tibble(parana$coords) %>%
mutate(Rainfall = parana$data)st_transform()We can transform the UTM projected coordinates to
geographic coordinates (longitude/latitude
with datum WGS84) using st_transform()
where we set CRS to "+proj=longlat +datum=WGS84".
gambia_geographic <- gambia_projected %>%
# second, transform
# projected coordinates to
# geographic coordinates
st_transform(crs = "+proj=longlat +datum=WGS84")
gambia_geographic## Simple feature collection with 65 features and 3 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -16.38755 ymin: 13.18541 xmax: -13.87273 ymax: 13.66438
## CRS: +proj=longlat +datum=WGS84
## # A tibble: 65 × 4
## total positive prev geometry
## * <int> <dbl> <dbl> <POINT [°]>
## 1 33 17 0.515 (-16.38755 13.18541)
## 2 63 19 0.302 (-16.30543 13.20444)
## 3 17 7 0.412 (-16.28914 13.20374)
## 4 24 8 0.333 (-16.25869 13.53742)
## 5 26 10 0.385 (-16.23294 13.20603)
## 6 18 7 0.389 (-16.23041 13.23094)
## 7 38 24 0.632 (-16.21431 13.20902)
## 8 56 7 0.125 (-16.19604 13.20668)
## 9 57 6 0.105 (-16.19569 13.56187)
## 10 26 8 0.308 (-16.16088 13.57834)
## # … with 55 more rows
Confirm the update of the:
CRS: +proj=longlat +datum=WGS84 inside the
header, andgeometry column to
degrees (<POINT [°]>).A PROJ string includes the following information:
+proj=: the projection of the data
(e.g. utm, longlat, or laea)+zone=: the zone of the data, specific to the UTM
projection (e.g. 28)+datum=: the datum use (e.g. WGS84)+units=: the units for the coordinates of the data
(e.g. m)With the UTM coordinate system data stored in q6:
Transform its Projected CRS to a Geographic CRS using the
longitude/latitude projection with datum
WGS84.
Now that you have set up the right CRS projection to your data, you can overlap these points with other Vector data objects:
gambia_adm_2 <- geoboundaries(country = "Gambia", adm_lvl = 2)ggplot() +
geom_sf(data = gambia_adm_2) +
geom_sf(data = gambia_geographic, mapping = aes(color = prev)) +
colorspace::scale_color_continuous_sequential(palette="Reds 3")Which CRS to use?
“There exist no all-purpose projections, all involve distortion when far from the center of the specified frame” (Bivand, Pebesma, and Gómez-Rubio 2013).
In this lesson, we have learned how to manage a CRS
projection in ggplot maps and sf
objects, how projections are codified with
EPSG codes and PROJ strings, and how to
transform the CRS between different coordinate
systems (projected and geographic).
In the next lesson, we are going to use all our previous learning to built one single thematic map by layers, and enrich them with text and labels referring to specific places or regions, and important map elements like scale bars and a north arrow!
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
Moreno, M., Basille, M. Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 1: Basics. (2018). Retrieved 10 May 2022, from https://r-spatial.org/r/2018/10/25/ggplot2-sf.html
Data carpentry. Introduction to Geospatial Concepts: Coordinate Reference Systems. (2021). Retrieved 15 May 2022, from https://datacarpentry.org/organization-geospatial/03-crs/index.html
Moraga, Paula. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapter 9: Spatial modeling of geostatistical data. Malaria in The Gambia. (2019). Retrieved 10 May 2022, from https://www.paulamoraga.com/book-geospatial/sec-geostatisticaldataexamplespatial.html
Carrasco-Escobar, G., Barja, A., Quispe, J. [Visualization and Analysis of Spatial Data in Public Health]. (2021). Retrieved 15 May 2022, from https://www.reconlearn.org/post/spatial-analysis-1-spanish.html